
    Abelian-Square-Rich Words

    An abelian square is the concatenation of two words that are anagrams of one another. A word of length n can contain at most Θ(n²) distinct factors, and there exist words of length n containing Θ(n²) distinct abelian-square factors, that is, distinct factors that are abelian squares. This motivates us to study infinite words such that the number of distinct abelian-square factors of length n grows quadratically with n. More precisely, we say that an infinite word w is abelian-square-rich if, for every n, every factor of w of length n contains, on average, a number of distinct abelian-square factors that is quadratic in n; and uniformly abelian-square-rich if every factor of w contains a number of distinct abelian-square factors that is proportional to the square of its length. Of course, if a word is uniformly abelian-square-rich, then it is abelian-square-rich, but we show that the converse is not true in general. We prove that the Thue-Morse word is uniformly abelian-square-rich and that the function counting the number of distinct abelian-square factors of length 2n of the Thue-Morse word is 2-regular. As for Sturmian words, we prove that a Sturmian word s_α of angle α is uniformly abelian-square-rich if and only if the irrational α has bounded partial quotients, that is, if and only if s_α has bounded exponent.
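
    As a concrete illustration of the object being counted (a brute-force sketch, not taken from the paper), the following snippet counts the distinct abelian-square factors of a finite word by comparing the letter counts (Parikh vectors) of the two halves of every even-length factor:

    ```python
    from collections import Counter

    def distinct_abelian_squares(word):
        """Count distinct factors of `word` that are abelian squares,
        i.e. factors uv with |u| = |v| and v an anagram of u.
        Brute force, O(n^3) over all even-length factors."""
        found = set()
        n = len(word)
        for length in range(2, n + 1, 2):        # even lengths only
            half = length // 2
            for i in range(n - length + 1):
                u = word[i:i + half]
                v = word[i + half:i + length]
                if Counter(u) == Counter(v):     # equal Parikh vectors
                    found.add(word[i:i + length])
        return len(found)

    # A prefix of the Thue-Morse word is a natural test case.
    print(distinct_abelian_squares("0110100110010110"))
    ```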

    The Rightmost Equal-Cost Position Problem

    LZ77-based compression schemes compress the input text by replacing factors in the text with an encoded reference to a previous occurrence, formed by the pair (length, offset). For a given factor, the smaller the offset, the smaller the resulting compression ratio. This is optimally achieved by using the rightmost occurrence of a factor in the previous text. Given a cost function, for instance the minimum number of bits used to represent an integer, we define the Rightmost Equal-Cost Position (REP) problem as the problem of finding one of the occurrences of a factor whose cost is equal to the cost of the rightmost one. We present the Multi-Layer Suffix Tree data structure that, for a text of length n, at any time i, provides REP(LPF) in constant time, where LPF is the longest previous factor, i.e. the greedy phrase; a reference to the list of REP({set of prefixes of LPF}) in constant time; and REP(p) in time O(|p| log log n) for any given pattern p.
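
    The paper's Multi-Layer Suffix Tree is beyond a few lines, but the parsing it accelerates can be sketched naively. The toy below (an illustrative quadratic-time sketch, not the paper's algorithm) computes a greedy LZ77 parsing that, among the previous occurrences of the longest previous factor, always references the rightmost one, which minimizes the offset:

    ```python
    def lz77_rightmost(text):
        """Greedy LZ77 parsing referencing the rightmost previous
        occurrence of the longest previous factor (LPF). Overlapping
        (self-referencing) matches are disallowed for simplicity."""
        i, phrases = 0, []
        n = len(text)
        while i < n:
            best_len, best_pos = 0, -1
            for j in range(i):                   # candidate occurrence start
                l = 0
                while i + l < n and j + l < i and text[j + l] == text[i + l]:
                    l += 1
                # on ties, later j wins: rightmost occurrence, smallest offset
                if l > best_len or (l == best_len and l > 0):
                    best_len, best_pos = l, j
            if best_len == 0:
                phrases.append((0, 0, text[i]))                 # literal
                i += 1
            else:
                phrases.append((best_len, i - best_pos, None))  # (length, offset)
                i += best_len
        return phrases

    print(lz77_rightmost("abababab"))
    ```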

    A Multidimensional Critical Factorization Theorem

    The Critical Factorization Theorem is one of the principal results in combinatorics on words. It relates the local periodicities of a word to its global periodicity. In this paper we give a multidimensional extension of it. More precisely, we give a new proof of the Critical Factorization Theorem, but in a weak form, where the weakness is due to the fact that we lose the tightness of the local repetition order. In exchange, we gain the possibility of extending our proof to the multidimensional case. Indeed, this new proof makes use of the Theorem of Fine and Wilf, which has several classical generalizations to the multidimensional case.
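
    For orientation, the one-dimensional Theorem of Fine and Wilf used in the proof can be stated as follows (a standard textbook statement; the snippet assumes an amsthm `theorem` environment):

    ```latex
    \begin{theorem}[Fine--Wilf]
      Let $w$ be a word having periods $p$ and $q$.
      If $|w| \ge p + q - \gcd(p, q)$, then $w$ also has period $\gcd(p, q)$.
    \end{theorem}
    ```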

    On the number of factors of Sturmian words

    We prove that for m ⩾ 1, card(A_m) = 1 + ∑_{i=1}^{m} (m − i + 1)φ(i), where A_m is the set of factors of length m of all the Sturmian words and φ is the Euler function. This result was conjectured by Dulucq and Gouyou-Beauchamps (1987), who proved that this result implies that the language (∪_{m⩾0} A_m)^c is inherently ambiguous. We also give a combinatorial version of the Riemann hypothesis.
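
    The formula is easy to check numerically. This illustrative snippet (not from the paper) computes Euler's totient naively and evaluates the stated sum:

    ```python
    from math import gcd

    def phi(i):
        """Euler's totient: how many 1 <= k <= i are coprime to i."""
        return sum(1 for k in range(1, i + 1) if gcd(k, i) == 1)

    def sturmian_factor_count(m):
        """card(A_m) = 1 + sum_{i=1}^{m} (m - i + 1) * phi(i)."""
        return 1 + sum((m - i + 1) * phi(i) for i in range(1, m + 1))

    # First values of the sequence: 2, 4, 8, 14, 24
    print([sturmian_factor_count(m) for m in range(1, 6)])
    ```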

    Minimal forbidden words and factor automata

    Let L(M) be the (factorial) language avoiding a given antifactorial language M. We design an automaton accepting L(M) and built from the language M. The construction is effective if M is finite. If M is the set of minimal forbidden words of a single word v, the automaton turns out to be the factor automaton of v (the minimal automaton accepting the set of factors of v). We also give an algorithm that builds the trie of M from the factor automaton of a single word. It yields a non-trivial upper bound on the number of minimal forbidden words of a word.
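
    One standard way to obtain an automaton accepting the factors of a single word v (though not necessarily the minimal one, and not the construction of the paper) is to build the suffix automaton of v and make every state accepting, as in this sketch:

    ```python
    def factor_automaton(v):
        """Suffix automaton of v with all states accepting: it then
        recognizes exactly the set of factors of v. Standard online
        construction (transitions, suffix links, class lengths)."""
        trans, link, length = [{}], [-1], [0]
        last = 0
        for c in v:
            cur = len(trans)
            trans.append({}); link.append(-1); length.append(length[last] + 1)
            p = last
            while p != -1 and c not in trans[p]:
                trans[p][c] = cur
                p = link[p]
            if p == -1:
                link[cur] = 0
            else:
                q = trans[p][c]
                if length[p] + 1 == length[q]:
                    link[cur] = q
                else:                  # clone q to keep lengths consistent
                    clone = len(trans)
                    trans.append(dict(trans[q])); link.append(link[q])
                    length.append(length[p] + 1)
                    while p != -1 and trans[p].get(c) == q:
                        trans[p][c] = clone
                        p = link[p]
                    link[q] = link[cur] = clone
            last = cur
        return trans

    def is_factor(trans, w):
        s = 0
        for c in w:
            if c not in trans[s]:
                return False
            s = trans[s][c]
        return True

    fa = factor_automaton("abaab")
    print(is_factor(fa, "baa"), is_factor(fa, "bb"))  # True False
    ```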

    Text Compression Using Antidictionaries

    We give a new text compression scheme based on Forbidden Words ("antidictionary"). We prove that our algorithms attain the entropy for balanced binary sources. They run in linear time. Moreover, one of the main advantages of this approach is that it produces very fast decompressors. A second advantage is a synchronization property that is helpful to search compressed data and allows parallel compression. Our algorithms can also be presented as "compilers" that create compressors dedicated to any previously fixed source. The techniques used in this paper are from Information Theory and Finite Automata.
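
    The core idea admits a small toy illustration (a sketch under simplifying assumptions, not the paper's linear-time algorithms): if the text avoids every word of the antidictionary, then whenever extending the current prefix with one bit would create a forbidden suffix, that bit is predictable and can be erased; the decompressor recomputes it.

    ```python
    def forced_bit(context, antidict):
        """If appending one bit to `context` would create a forbidden
        word as a suffix, the other bit is forced (the text is assumed
        to avoid every word of the antidictionary)."""
        for b in "01":
            if any((context + b).endswith(w) for w in antidict):
                return "1" if b == "0" else "0"
        return None

    def compress(text, antidict):
        out = []
        for i, b in enumerate(text):
            if forced_bit(text[:i], antidict) is None:
                out.append(b)      # unpredictable bit: keep it
            # predictable bit: erased; the decompressor restores it
        return "".join(out)

    def decompress(code, n, antidict):
        out, it = [], iter(code)
        while len(out) < n:
            f = forced_bit("".join(out), antidict)
            out.append(f if f is not None else next(it))
        return "".join(out)

    ad = {"00", "111"}             # a toy antidictionary
    t = "0101101"                  # avoids both forbidden words
    c = compress(t, ad)
    assert decompress(c, len(t), ad) == t
    print(t, "->", c)              # 0101101 -> 001
    ```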

    Entropy and Compression: A simple proof of an inequality of Khinchin-Ornstein-Shields

    This paper concerns the folklore statement that "entropy is a lower bound for compression". More precisely, we derive from the entropy theorem a simple proof of a pointwise inequality first stated by Ornstein and Shields, which is the almost-sure version of an average inequality first stated by Khinchin in 1953. We further give an elementary proof of the original Khinchin inequality, which can be used as an exercise for Information Theory students, and we conclude by giving historical and technical notes on this inequality.
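
    The average version of the statement is a standard fact worth recording next to the abstract (stated here in textbook form, not in the paper's exact formulation): for any uniquely decodable code, the expected code length is bounded below by the entropy.

    ```latex
    % For a uniquely decodable code with length function \ell on a source X,
    % Kraft's inequality \sum_x 2^{-\ell(x)} \le 1 and Gibbs' inequality give
    \begin{equation*}
      \mathbb{E}[\ell(X)] \;=\; \sum_x p(x)\,\ell(x)
      \;\ge\; -\sum_x p(x)\log_2 p(x) \;=\; H(X).
    \end{equation*}
    ```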

    Using Inductive Logic Programming to globally approximate Neural Networks for preference learning: challenges and preliminary results

    In this paper we explore the use of Answer Set Programming (ASP), and in particular the state-of-the-art Inductive Logic Programming (ILP) system ILASP, as a method to explain black-box models, e.g. Neural Networks (NNs), when they are used to learn user preferences. To this aim, we created a dataset of users' preferences over a set of recipes, trained a set of NNs on these data, and performed preliminary experiments that investigate how ILASP can globally approximate these NNs. Since the computational time required to train ILASP on high-dimensional feature spaces is very high, we focused on the problem of making global approximation more scalable. In particular, we experimented with the use of Principal Component Analysis (PCA) to reduce the dimensionality of the dataset while trying to keep our explanations transparent.
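
    The dimensionality-reduction step can be sketched with scikit-learn (an illustrative snippet with made-up shapes, not the authors' pipeline):

    ```python
    import numpy as np
    from sklearn.decomposition import PCA

    # Hypothetical preference dataset: 200 users x 50 recipe features.
    rng = np.random.default_rng(0)
    X = rng.random((200, 50))

    # Keep enough components to explain 95% of the variance before
    # handing the reduced features to the ILP learner.
    pca = PCA(n_components=0.95)
    X_reduced = pca.fit_transform(X)
    print(X.shape, "->", X_reduced.shape)
    ```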